Native Java Library

Prerequisites

The pdf2Data Java SDK requires Java 8, Java 11, or Java 17 to be installed on your system.

We guarantee software compatibility with the Oracle JRE 8 and Open JRE 11/17.

We recommend at least 1.5 GB of Java heap space, plus 500 MB for each additional thread.
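The heap guideline above can be turned into a quick calculation. The following is a minimal sketch, not part of the SDK: `HeapSizing` is a hypothetical helper, and it assumes 1.5 GB is read as 1536 MB and that "each additional thread" means every thread beyond the first.

```java
// Sketch only: HeapSizing is a hypothetical helper, not a pdf2Data class.
final class HeapSizing {
    static final long MB = 1024L * 1024L;
    static final long BASE_HEAP_MB = 1536;        // assumption: 1.5 GB taken as 1536 MB
    static final long PER_EXTRA_THREAD_MB = 500;  // for every thread beyond the first

    /** Recommended heap in megabytes for the given number of worker threads. */
    static long recommendedHeapMb(int threads) {
        if (threads < 1) {
            throw new IllegalArgumentException("threads must be >= 1");
        }
        return BASE_HEAP_MB + PER_EXTRA_THREAD_MB * (threads - 1);
    }

    /** True if the maximum heap of the running JVM (-Xmx) covers the recommendation. */
    static boolean heapIsSufficient(int threads) {
        long maxHeapMb = Runtime.getRuntime().maxMemory() / MB;
        return maxHeapMb >= recommendedHeapMb(threads);
    }

    public static void main(String[] args) {
        int threads = 4;
        System.out.println("Suggested setting for " + threads + " threads: -Xmx"
                + recommendedHeapMb(threads) + "m");
        System.out.println("Current JVM heap sufficient: " + heapIsSufficient(threads));
    }
}
```

Running the check once at application startup makes an undersized `-Xmx` setting visible before the first document is processed.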

System Requirements

  • Recommended minimum hardware configuration:
    • CPU: 2 cores
    • Memory: 2 GB
    • Temp storage: 2 GB free disk space

While the Java SDK will work fine on a single core, we recommend using multiple cores in cases where you handle documents in parallel using separate threads (one document per thread).
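One document per thread can be arranged with a fixed-size thread pool from the standard library. This is a minimal sketch under stated assumptions: `ParallelDocuments` and `processOne` are hypothetical names, and the per-document work is a placeholder. Whether a single extractor instance may be shared between threads is not stated here, so that decision is left to whatever `processOne` does.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Sketch only: ParallelDocuments is a hypothetical helper, not a pdf2Data class.
final class ParallelDocuments {

    /** Runs processOne once per document on a fixed pool; results keep the input order. */
    static <R> List<R> processAll(List<String> pdfPaths,
                                  Function<String, R> processOne,
                                  int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<R>> futures = new ArrayList<>();
            for (String path : pdfPaths) {
                Callable<R> task = () -> processOne.apply(path); // one document per task
                futures.add(pool.submit(task));
            }
            List<R> results = new ArrayList<>();
            for (Future<R> future : futures) {
                try {
                    results.add(future.get()); // blocks until that document is done
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("Interrupted while waiting", e);
                } catch (ExecutionException e) {
                    throw new IllegalStateException("Processing failed", e.getCause());
                }
            }
            return results;
        } finally {
            pool.shutdown(); // stop accepting tasks; submitted tasks still finish
        }
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList("a.pdf", "b.pdf", "c.pdf");
        // Placeholder work; a real task would run the extraction for one PDF.
        System.out.println(processAll(docs, path -> "processed " + path, 2));
    }
}
```

Sizing the pool to the number of available cores keeps the heap requirement predictable, since each additional thread adds to the recommended heap.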

Installation

The preferred way to set up iText pdf2Data in Java is to use a build system such as Maven or Gradle and download the pdf2Data artifacts from the iText Artifactory.

The groupId is com.itextpdf.pdf2data, and the artifactId is pdf2data.

In Maven, the configuration would look similar to the example below:

Maven

Add the pdf2Data repository to the <repositories> section.

<repositories>
    <repository>
        <id>pdf2Data</id>
        <name>pdf2Data Maven Repository</name>
        <url>https://repo.itextsupport.com/pdf2data</url>
    </repository>
    <repository> <!-- can be skipped if license is unlimited or local reporting is going to be configured -->
        <id>itext-releases</id>
        <name>iText Repository-releases</name>
        <url>https://repo.itextsupport.com/releases</url>
    </repository>
</repositories>

Then add the dependencies to the <dependencies> section.

<dependencies>
    <dependency>
        <groupId>com.itextpdf.pdf2data</groupId>
        <artifactId>pdf2data</artifactId>
        <version>4.5.0</version>
    </dependency>
    <dependency> <!-- can be skipped if license is unlimited or local reporting is going to be configured -->
        <groupId>com.itextpdf.licensing</groupId>
        <artifactId>licensing-remote</artifactId>
        <version>4.0.5</version>
    </dependency>
</dependencies>

Using pdf2Data from your code

As of pdf2Data 4.0, the format of extraction templates has changed compared to pdf2Data 3.*. See the Migration guide for details.

With the pdf2Data UI (pdf2Data 4.0+), you can download templates optimized for use in the pdf2Data SDK.

1. Load the pdf2Data license

Make sure to load the license file before invoking any other pdf2Data code.

LicenseKey.loadLicenseFile(pathToLicenseFile);

2. Create an extractor

A pdf2Data extractor can be created from an extraction template downloaded from the pdf2Data UI.

Initialize the Pdf2DataExtractor instance from a processed template with a single function call:

Pdf2DataExtractor extractor = Pdf2DataExtractor.create(new File(P2D_TEMPLATE_PATH));
tip

The extractor can be reused to process a batch of PDF files in a loop.
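For a batch run, the loop first needs the list of input files. The sketch below covers only that step, using the standard library: `PdfBatch` and `listPdfs` are hypothetical names, and the loop body is a placeholder where the already created extractor would be called for each file.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Sketch only: PdfBatch is a hypothetical helper, not a pdf2Data class.
final class PdfBatch {

    /** Lists the *.pdf files directly inside dir, sorted by path for a stable order. */
    static List<Path> listPdfs(Path dir) {
        List<Path> pdfs = new ArrayList<>();
        // Glob matching follows platform rules, so "*.pdf" may not match "*.PDF".
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, "*.pdf")) {
            for (Path pdf : stream) {
                pdfs.add(pdf);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        Collections.sort(pdfs);
        return pdfs;
    }

    public static void main(String[] args) {
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        for (Path pdf : listPdfs(dir)) {
            // Placeholder: pass pdf.toFile() to the extractor created once before the loop.
            System.out.println("would process " + pdf);
        }
    }
}
```

Creating the extractor once before the loop, rather than once per file, avoids loading the template again for every document.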

3. Extract data from PDF

RecognitionResultHolder result = extractor.recognizeOnPdf(new File(PDF_PATH));

You can use the extracted values directly from the result or save them in one of two structured formats (XML or JSON).

4. Get results for a specific data field

You can get all results as a sorted map by calling:

SortedMap<String, DataFieldResult> allResults = result.getDataFieldResults();

To get the results for a specific data field, use this call:

List<AbstractValueResult> dataFieldResult = allResults.get(DATAFIELD_NAME).getResults();
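Because `Map.get` returns `null` for a key that is not present, chaining a call directly onto the lookup fails with a `NullPointerException` when the template has no field of that name. The following is a minimal sketch of a null-safe lookup. It uses a plain `SortedMap<String, List<String>>` as a stand-in for the SDK result types, and `FieldLookup`, `valuesOrEmpty`, and the field names are hypothetical.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

// Sketch only: FieldLookup is a hypothetical helper; the map is a stand-in for SDK types.
final class FieldLookup {

    /** Returns the values stored under fieldName, or an empty list if the field is absent. */
    static <V> List<V> valuesOrEmpty(SortedMap<String, List<V>> allResults, String fieldName) {
        List<V> values = allResults.get(fieldName); // null when the key is unknown
        return values != null ? values : Collections.<V>emptyList();
    }

    public static void main(String[] args) {
        SortedMap<String, List<String>> allResults = new TreeMap<>();
        allResults.put("invoiceNumber", Arrays.asList("INV-001")); // hypothetical field
        allResults.put("customer", Arrays.asList("ACME"));         // hypothetical field

        System.out.println(valuesOrEmpty(allResults, "invoiceNumber"));
        System.out.println(valuesOrEmpty(allResults, "missingField"));
        // A SortedMap iterates its keys in sorted order.
        System.out.println(allResults.keySet());
    }
}
```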

Result objects have a structure similar to the one described in the Recognition result specification; you can also consult the SDK JavaDocs.

5. Save extracted data

tip

By default, your data is saved without metadata. To include it in the result, use the method overloads that accept the following SerializationProperties:

SerializationProperties properties = new SerializationProperties().setIncludeMetaData(true);
XML
// Write the results directly to a file.
result.writeToXml(new File(RESULT_XML_PATH));

// writing result directly to HTTP response
result.writeToXml(response.getOutputStream()); // any other OutputStream implementation can be passed here

To save the result with metadata:

// save to file
result.writeToXml(new File(RESULT_XML_PATH), properties);

// writing result directly to HTTP response
result.writeToXml(response.getOutputStream(), properties); // any other OutputStream implementation can be passed here
JSON
// Write the results directly to a file.
result.writeToJson(new File(RESULT_JSON_PATH));

// writing result directly to HTTP response
result.writeToJson(response.getOutputStream()); // any other OutputStream implementation can be passed here

To save the result with metadata:

// save to file
result.writeToJson(new File(RESULT_JSON_PATH), properties);

// writing result directly to HTTP response
result.writeToJson(response.getOutputStream(), properties); // any other OutputStream implementation can be passed here
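Since any `OutputStream` can be passed, an in-memory stream is one way to obtain the serialized result as a string, for example to store it in a database column. This is a minimal sketch with stated assumptions: `CaptureOutput` is a hypothetical helper, the `Consumer<OutputStream>` stands in for a call such as `out -> result.writeToJson(out)`, and the output is assumed to be UTF-8 encoded. If the SDK method declares a checked exception, the lambda would need to wrap it.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.function.Consumer;

// Sketch only: CaptureOutput is a hypothetical helper, not a pdf2Data class.
final class CaptureOutput {

    /** Runs writer against an in-memory stream and returns what it wrote, decoded as UTF-8. */
    static String captureAsString(Consumer<OutputStream> writer) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        writer.accept(buffer);
        return new String(buffer.toByteArray(), StandardCharsets.UTF_8); // assumption: UTF-8
    }

    public static void main(String[] args) {
        // Placeholder writer; a real one would delegate to the result holder.
        String text = captureAsString(out -> {
            try {
                out.write("{\"field\":\"value\"}".getBytes(StandardCharsets.UTF_8));
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        });
        System.out.println(text);
    }
}
```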

Full code sample

LicenseKey.loadLicenseFile(pathToLicenseFile);

Pdf2DataExtractor extractor = Pdf2DataExtractor.create(new File(P2D_TEMPLATE_PATH));
RecognitionResultHolder result = extractor.recognizeOnPdf(new File(PDF_PATH));

// Write the results directly to a file.
result.writeToXml(new File(RESULT_XML_PATH));
result.writeToJson(new File(RESULT_JSON_PATH));

// writing result directly to HTTP response
result.writeToXml(response.getOutputStream()); // any other OutputStream implementation can be passed here
result.writeToJson(response.getOutputStream());

// To access the result objects directly, e.g. to save them to a DB or other structured storage:
// all results:
SortedMap<String, DataFieldResult> allResults = result.getResult().getDataFieldResults();
List<AbstractValueResult> dataFieldResult = allResults.get(DATAFIELD_NAME).getResults();

Deprecated API

caution

Note that the functions used in the samples above were introduced in 4.4.0 and produce results in the new, refined format. Versions before 5.0.0 still contain the legacy API, which produces the old result format, but it will be dropped as of 5.0.0, so we recommend migrating to the new functions and format.